Local case-control sampling について

Words near each other

・ Local boards formed in England and Wales 1848–94
・ Local bodies in Tamil Nadu
・ Local body elections in Tamil Nadu
・ Local Body Tax
・ Local border traffic
・ Local boundedness
・ Local Boy
・ Local Boy in the Photograph
・ Local Boy Makes Good
・ Local Boys
・ Local Bubble
・ Local bus
・ Local Business
・ Local call
・ Local Capital Finance Company
・ Local case-control sampling
・ Local chief executives
・ Local Choice Spirits
・ Local church
・ Local church (disambiguation)
・ Local Church controversies
・ Local churches (affiliation)
・ Local class field theory
・ Local cohomology
・ Local Colleges and Universities (Philippines)
・ Local collision
・ Local color
・ Local Color (album)
・ Local Color (book)
・ Local Color (film)

Dictionary Lists

mini英和辞書

翻訳と辞書　辞書検索 [ 開発暫定版 ]

スポンサードリンク

Local case-control sampling ：ウィキペディア英語版

Local case-control sampling
In machine learning, local case-control sampling is an algorithm used to reduce the complexity of training a logistic regression classifier. The algorithm reduces the training complexity by selecting a small subsample of the original dataset for training. It assumes the availability of a (unreliable) pilot estimation of the parameters. It then performs a single pass over the entire dataset using the pilot estimation to identify the most "surprising" samples. In practice, the pilot may come from prior knowledge or training using a subsample of the dataset. The algorithm is most effective when the underlying dataset is imbalanced. It exploits the structures of conditional imbalanced datasets more efficiently than alternative methods, such as case control sampling and weighted case control sampling.
== Imbalanced datasets ==
In classification, a dataset is a set of ''N'' data points

(x_i, y_i)_^N

, where

x_i \in\mathbb R^d

is a feature vector,

y_i \in \

is a label. Intuitively, a dataset is imbalanced when certain important statistical patterns are rare. The lack of observations of certain patterns does not always imply their irrelevance. For example, in medical studies of rare diseases, the small number of infected patients (cases) conveys the most valuable information for diagnosis and treatments.
Formally, an imbalanced dataset exhibits one or more of the following properties:
* ''Marginal Imbalance''. A dataset is marginally imbalanced if one class is rare compared to the other class. In other words,

\mathbb(Y=1) \approx 0

.
* ''Conditional Imbalance''. A dataset is conditionally imbalanced when it is easy to predict the correct labels in most cases. For example, if

X \in \

, the dataset is conditionally imbalanced if

\mathbb(Y=1\mid X=0) \approx 0

and

\mathbb(Y=1\mid X=1) \approx 1

.
== Algorithm outline ==
In logistic regression, given the model

\theta = (\alpha, \beta)

, the prediction is made according to

\mathbb(Y=1\mid X; \theta) =  \tilde_(x) = \frac

. The local-case control sampling algorithm assumes the availability of a pilot model

\tilde =  (\tilde, \tilde)

. Given the pilot model, the algorithm performs a single pass over the entire dataset to select the subset of samples to include in training the logistic regression model. For a sample

(x,y)

, define the acceptance probability as

a(x,y) = |y-\tilde_(a(x_i,y_i))

for

i \in \

.
# Fit a logistic regression model to the subsample

S = \

, obtaining the unadjusted estimates

\hat_S = (\hat_S, \hat_S)

.
# The output model is

\hat = (\hat, \hat)

, where

\hat \leftarrow \hat_S + \tilde

and

\hat \leftarrow \hat_S + \tilde

.
The algorithm can be understood as selecting samples that surprises the pilot model. Intuitively these samples are closer to the decision boundary of the classifier and is thus more informative.

抄文引用元・出典: フリー百科事典『ウィキペディア（Wikipedia）』
■ウィキペディアで「Local case-control sampling」の詳細全文を読む

スポンサードリンク

翻訳と辞書 : 翻訳のためのインターネットリソース